Leadership Statistics in Random Structures

The largest component ("the leader") in evolving random structures often exhibits universal
statistical properties. This phenomenon is demonstrated analytically for two ubiquitous structures:
random trees and random graphs. In both cases, lead changes are rare, as the average number of
lead changes increases quadratically with the logarithm of the system size. As a function of time,
the number of lead changes is self-similar. Additionally, the probability that no lead change ever
occurs decays exponentially with the average number of lead changes.

Extreme statistics are important in science, engineering, and society as they dictate
catastrophic events, system robustness, financial indices, etc. The theory of extreme
statistics provides a powerful analysis framework and prediction tool. However, it is
limited to ensembles of independent random variables. Even though most practical
applications involve correlated random variables, such cases remain largely unexplored.

We investigate extremal characteristics of two basic random structures: random trees and
random graphs. Random trees appear in data storage algorithms in computer science and in
physical processes such as collisions in gases. Random graphs have numerous applications
to theoretical computer science, social networks, and physical processes such as
polymerization [14].

We focus on the largest component, the leader, and ask: What is the size of the leader?
How does the number of lead changes depend on time and system size? What is the
probability that the leader never changes? Similar questions were investigated in growing
networks, and related leadership statistics were studied in random graphs by Erdős and
Łuczak.

Random trees and random graphs are special cases of aggregation processes and hence, we
analyze them using the rate equation approach [18, 19, 20, 21]. Characterization of
leadership statistics is sensible only for finite systems. We thus consider large yet
finite systems for which the rate equation approach yields the leading asymptotic
dependence on the system size [23, 24].

Our main result is that the total number of lead changes $L$ grows as
$L(N) \sim (\ln N)^2$ with the system size $N$. This as well as other leadership
statistics are universal as they characterize both random trees and random graphs. The
time-dependent number of lead changes $L(t,N)$ attains the scaling form
$(\ln N)^2 F(x)$ with the scaling variable $x = \ln k_* / \ln N$, where $k_*$ is the
typical component size. Additionally, the probability that no lead change ever occurs
decays as $e^{-L}$.

We start with the simpler case of random trees. These are generated according to the
following procedure. Initially, the system consists of $N$ single-leaf trees. Then, two
trees are picked at random and attached to a common root. This merging process is
repeated until a single tree with $N$ leaves emerges. We treat the merging process
dynamically. Let $n$ be the number of trees. In the thermodynamic limit, the normalized
density $c = n/N$ evolves according to $\frac{dc}{dt} = -c^2$ because in every merger
two trees are lost and one is gained (for convenience, the merger rate is set to unity).
Subject to the initial condition $c(0) = 1$, the density is $c(t) = (1+t)^{-1}$. Given
the simple relations between the number of trees $n = N(1+t)^{-1}$, the average size
$m = 1 + t$, and time $t$, we state our results in terms of time.

The size distribution is obtained similarly. Let $n_k$ be the number of trees with $k$
leaves. The normalized density $c_k = n_k/N$ evolves according to the Smoluchowski rate
equation $\frac{dc_k}{dt} = \sum_{i+j=k} c_i c_j - 2 c\, c_k$ with the monodisperse
initial conditions $c_k(0) = \delta_{k,1}$. The rate equation reflects the fact that
trees are merged randomly, independent of their size. It can be solved (using the
generating function technique, for example) to give [18, 19]

$$ c_k(t) = \frac{t^{k-1}}{(1+t)^{k+1}}. \tag{1} $$
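The closed form (1) can be checked numerically. The following sketch, which is not part of the original analysis, integrates a truncated version of the rate equation with a standard fourth-order Runge-Kutta scheme and compares the result with Eq. (1). The cutoff `K`, the time step `dt`, and all function names are illustrative assumptions.

```python
def c_exact(k, t):
    """Closed form of Eq. (1): density of trees with k leaves at time t."""
    return t ** (k - 1) / (1.0 + t) ** (k + 1)


def rhs(c):
    """Right-hand side dc_k/dt = sum_{i+j=k} c_i c_j - 2 c c_k for k = 1..K.

    c[0] is unused padding. The total density c is approximated by the
    truncated sum over k <= K (an assumption: K must be large enough for
    the neglected tail to be negligible).
    """
    K = len(c) - 1
    total = sum(c[1:])
    out = [0.0] * (K + 1)
    for k in range(1, K + 1):
        gain = sum(c[i] * c[k - i] for i in range(1, k))
        out[k] = gain - 2.0 * total * c[k]
    return out


def integrate(t_end, K=40, dt=0.01):
    """Classical RK4 integration from c_k(0) = delta_{k,1} up to t_end."""
    c = [0.0] * (K + 1)
    c[1] = 1.0
    for _ in range(int(round(t_end / dt))):
        k1 = rhs(c)
        k2 = rhs([x + 0.5 * dt * a for x, a in zip(c, k1)])
        k3 = rhs([x + 0.5 * dt * a for x, a in zip(c, k2)])
        k4 = rhs([x + dt * a for x, a in zip(c, k3)])
        c = [x + dt * (a + 2.0 * b + 2.0 * g + d) / 6.0
             for x, a, b, g, d in zip(c, k1, k2, k3, k4)]
    return c


if __name__ == "__main__":
    c = integrate(1.0)
    for k in (1, 2, 5):
        print(k, c[k], c_exact(k, 1.0))
```

The truncation is harmless for $t \lesssim 1$ because the tail of Eq. (1) decays geometrically; for longer times a larger `K` would be needed.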



In the long time limit, the size distribution attains a simple self-similar behavior

$$ c_k(t) \simeq k_*^{-2}\, \Phi(k/k_*), \qquad \Phi(z) = e^{-z}, \tag{2} $$

with the typical size $k_* \simeq t$.

What is the average size of the largest tree (the leader)? Using the size distribution,
we can answer an even more general question. Let $l_r(t)$ be the average size of the
$r$th largest tree, with the leader $l \equiv l_1$. From the cumulative distribution
$u_k = \sum_{j \ge k} n_j = N t^{-1} [t/(1+t)]^k$ and the relation $u_{l_r} = r$, the
size of the $r$th leader is

$$ l_r(t,N) = \frac{\ln[N/(rt)]}{\ln[(1+t)/t]}. \tag{3} $$

FIG. 1: The normalized time dependence of the number of lead changes for random trees,
$L(t,N)/L(N)$, versus the scaling variable $x = \ln t/\ln N$. The simulation data,
representing an average over $10^3$ Monte Carlo runs, are compared with the theoretical
prediction $2F(x) = 2x - x^2$.
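As a small illustration of how Eq. (3) follows from the cumulative distribution, the sketch below evaluates both and confirms that the size returned by Eq. (3) satisfies $u_{l_r} = r$. The function names and the sample values of $N$, $t$, and $r$ are assumptions chosen for illustration only.

```python
import math


def cumulative(k, t, N):
    """u_k = N t^{-1} [t/(1+t)]^k: expected number of trees of size >= k."""
    return N / t * (t / (1.0 + t)) ** k


def rank_size(r, t, N):
    """Average size of the r-th largest tree, Eq. (3)."""
    return math.log(N / (r * t)) / math.log((1.0 + t) / t)


if __name__ == "__main__":
    N, t = 1e6, 5.0
    for r in (1, 2, 3):
        l_r = rank_size(r, t, N)
        # The last column should reproduce r.
        print(r, l_r, cumulative(l_r, t, N))
```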



There are two regimes of behavior. In the short time limit, $t \ll 1$, one has
$l_r(t,N) \simeq 1 + \ln[N/r]/\ln[1/t]$. Moreover, from $n_k \simeq N t^{k-1}$ the first
dimer appears at $t_2 \sim N^{-1}$; the first trimer appears at $t_3 \sim N^{-1/2}$ and
then, there are of the order $N^{1/2}$ dimers, so this trimer results from the leading
dimer with probability of the order $N^{-1/2}$. Since almost every lead change
introduces a new leader, the leader grows in increments of unity. At the crossover
point, $t \approx 1$, the size of the leader varies logarithmically with the system
size, $l(t \approx 1, N) \sim \ln N$. In the long time limit, $t \gg 1$, the size of the
leader grows linearly (up to a logarithmic correction) with time

$$ l_r(t,N) \simeq t \ln\frac{N}{rt}. \tag{4} $$

What is the average size $l_w$ of the winner (the last emerging leader)? At what time
$t_w$ does the winner emerge? Both quantities grow linearly with $N$:
$l_w \simeq \alpha N$ and $t_w \simeq \beta N$. The curve $\alpha = -\beta \ln \beta$
obtained from Eq. (3) has an extremum at $\alpha = \beta = e^{-1} = 0.36788$, thereby
implying the bounds $\alpha, \beta < e^{-1}$.

How many lead changes $L(t,N)$ occur as a function of time? As a function of system
size? What is the total number of lead changes $L(N) = L(t = \infty, N)$? In our
definition, a lead change occurs when two trees (neither of which is the leader) merge
to become larger than the leader. For short times, $t \ll 1$, we noted that
$L(t,N) \simeq l(t,N) - 1$. For long times, $t \gg 1$, consider the cumulative
distribution $u_k \simeq N t^{-1} \exp(-k/t)$. Its growth rate immediately gives the
rate by which the leader is surpassed,
$\frac{d}{dt} L(t,N) \simeq \frac{\partial}{\partial t} u_k \big|_{k=l}$. As
$u_l \simeq 1$, we have $\frac{d}{dt} L(t,N) \simeq l\, t^{-2} \simeq t^{-1} \ln\frac{N}{t}$,
from which the time-dependent number of lead changes is

$$ L(t,N) \simeq \ln t \, \ln N - \frac{1}{2} (\ln t)^2. \tag{5} $$

FIG. 2: The total number of lead changes $L(N)$ versus the system size $N$. Shown are
simulation results for Random Trees (RT) and Random Graphs (RG) representing an average
over $10^4$ realizations.

Interestingly, this quantity obeys the scaling form

$$ L(t,N) \simeq (\ln N)^2 F(x), \tag{6} $$

with the quadratic scaling function $F(x) = x - \frac{1}{2} x^2$. The scaling variable
$x = \ln t/\ln N$ is unusual: a ratio of logarithms, in contrast with the ordinary ratio
underlying the size distribution (2).
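For reference, the steps leading from the cumulative distribution to Eq. (5) and the scaling form (6) can be spelled out as follows. This is a sketch of the intermediate algebra under the stated long-time assumptions ($t \gg 1$, $l \gg t$); contributions of order $\ln N$ from the early phase $t \lesssim 1$ are neglected, so the lower integration limit is left unspecified.

```latex
\begin{align*}
\frac{dL}{dt} &\simeq \frac{\partial u_k}{\partial t}\Big|_{k=l}
  = u_l\left(\frac{l}{t^{2}} - \frac{1}{t}\right)
  \simeq \frac{l}{t^{2}} \simeq \frac{1}{t}\,\ln\frac{N}{t}, \\
L(t,N) &\simeq \int^{t} \frac{\ln N - \ln t'}{t'}\,dt'
  = \ln t\,\ln N - \tfrac{1}{2}(\ln t)^{2}, \\
\frac{L(t,N)}{(\ln N)^{2}} &\simeq x - \tfrac{1}{2}x^{2} = F(x),
  \qquad x = \frac{\ln t}{\ln N}.
\end{align*}
```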

To check these theoretical predictions, we performed large-scale Monte Carlo
simulations. In the simulations, randomly chosen trees are merged repeatedly. Keeping
track of the leader and averaging over many independent realizations, we observe a
scaling behavior that is consistent with Eq. (6). However, as a function of the system
size, the convergence is slow because the scaling variable involves logarithms.
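A minimal sketch of such a simulation is given below. It is an illustration, not the code used for the reported results. Two conventions are assumptions of this sketch: the first dimer is taken as the initial leader, and a lead change is counted only when a merger that does not involve the current leader produces a strictly larger tree (ties do not count). The function names and the seed handling are likewise illustrative.

```python
import random


def count_lead_changes(N, rng):
    """Merge N single-leaf trees at random; return the number of lead changes."""
    sizes = [1] * N      # sizes[i] = number of leaves of tree i
    leader = None        # index of the current leader (None before the first merger)
    changes = 0
    while len(sizes) > 1:
        i, j = rng.sample(range(len(sizes)), 2)   # two distinct trees, uniformly
        involved = leader in (i, j)
        keep, drop = min(i, j), max(i, j)
        sizes[keep] += sizes[drop]
        last = len(sizes) - 1
        if drop != last:                 # swap-remove: move the last tree into `drop`
            sizes[drop] = sizes[last]
            if leader == last:
                leader = drop
        sizes.pop()
        if leader is None or involved:
            leader = keep                # first dimer, or the leader itself grew
        elif sizes[keep] > sizes[leader]:
            leader = keep                # a non-leader pair overtook the leader
            changes += 1
    return changes


def mean_lead_changes(N, runs, seed=0):
    """Average number of lead changes over independent realizations."""
    rng = random.Random(seed)
    return sum(count_lead_changes(N, rng) for _ in range(runs)) / runs


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        print(N, mean_lead_changes(N, runs=200))
```

Each realization costs $O(N)$ operations, so the averages can be compared with $A (\ln N)^2$ over several decades of $N$; given the slow convergence noted above, modest sizes should not be expected to reproduce the asymptotic prefactor.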

We briefly mention a neat alternative derivation of the time-dependent behavior. The
number of lead changes can be obtained from $\frac{d}{dt} L \simeq (\Delta t)^{-1}$,
where the time interval between successive lead changes $\Delta t$ is estimated from
$l_1(t) = l_2(t + \Delta t)$. This approach confirms the scaling form (6) with
$F(x) = (x - \frac{1}{2} x^2)/\ln 2$, i.e., there is a factor $1/\ln 2$ discrepancy.
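One way to read this estimate is sketched below, using the long-time form (4) for the leader ($r = 1$) and the runner-up ($r = 2$) and expanding to first order in $\Delta t$. The expansion assumes $\Delta t \ll t$ and $\ln(N/t) \gg 1$; it is offered as a plausible reconstruction of the intermediate steps rather than as the authors' exact calculation.

```latex
\begin{align*}
l_1(t) &\simeq t\,\ln\frac{N}{t}, \qquad
l_2(t+\Delta t) \simeq (t+\Delta t)\,\ln\frac{N}{2(t+\Delta t)}
  \simeq t\,\ln\frac{N}{t} - t\ln 2 + \Delta t\,\ln\frac{N}{t}, \\
l_1(t) = l_2(t+\Delta t) &\;\Longrightarrow\;
  \Delta t \simeq \frac{t\,\ln 2}{\ln (N/t)}, \\
\frac{dL}{dt} \simeq \frac{1}{\Delta t}
  &\simeq \frac{1}{\ln 2}\,\frac{1}{t}\,\ln\frac{N}{t}
  \;\Longrightarrow\;
  F(x) = \frac{x - \tfrac{1}{2}x^{2}}{\ln 2}.
\end{align*}
```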

The time-dependent behavior can be used to obtain the total number of lead changes.
Substituting $t_w \simeq \beta N$ into Eq. (5) gives

$$ L(N) \simeq A (\ln N)^2 \tag{7} $$

with $A = F(1) = 1/2$. Both the leading asymptotic behavior and the $\ln N$ correction
are confirmed numerically. Moreover, the numerical prefactor $A = 0.50(1)$ agrees with
the theoretical prediction (Fig. 2). Since $L(t \approx 1, N) \sim \ln N$, the majority
of lead changes occur when $t \gg 1$.

FIG. 3: The survival probability of the initial leader $S(N)$ versus the system size
$N$. The number of realizations was $10^{10}$ and $10^8$ for random trees and random
graphs, respectively.

How is the number of lead changes distributed? What is the probability that no lead
change occurs? Let $P_n(t,N)$ be the probability that $n$ lead changes occur until time
$t$. The flux surpassing the leader characterizes the evolution of the probability
distribution, $\frac{d}{dt} P_n = \frac{dL}{dt} \left[P_{n-1} - P_n\right]$. With the
initial condition $P_n(0,N) = \delta_{n,0}$, the distribution is Poissonian

$$ P_n(t,N) = \frac{[L(t,N)]^n}{n!}\, e^{-L(t,N)}. \tag{8} $$



Consequently, the distribution of the total number of lead changes
$P_n(N) = P_n(t = \infty, N)$ is also Poissonian: $P_n(N) = \frac{L^n}{n!} e^{-L}$ with
$L \equiv L(N)$ given by Eq. (7). Hence, the standard deviation of the number of lead
changes $\sigma(N)$ grows as $\sigma(N) \simeq \sqrt{A}\, \ln N$. Furthermore, the
probability that no lead change occurs (the survival probability of the first leader)
$S(N) \equiv P_0(N)$ decays faster than a power law but slower than a stretched
exponential

$$ S(N) = \exp[-L] \sim \exp\left[-A (\ln N)^2\right]. \tag{9} $$



The asymptotic $N$-dependence is confirmed numerically (Fig. 3).
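To make Eqs. (8) and (9) concrete, the sketch below evaluates the Poisson distribution for a given average $L$ and the leading-order survival probability. It only restates the formulas above; the function names are illustrative, and the value of $A$ passed in is whatever prefactor one wishes to test (the leading-order expression ignores the corrections discussed in the text).

```python
import math


def poisson_pmf(n, L):
    """Eq. (8): P_n = L^n e^{-L} / n!, evaluated in log space for stability."""
    if L == 0:
        return 1.0 if n == 0 else 0.0
    return math.exp(n * math.log(L) - L - math.lgamma(n + 1))


def survival(N, A):
    """Leading-order form of Eq. (9): S(N) = exp[-A (ln N)^2]."""
    return math.exp(-A * math.log(N) ** 2)


if __name__ == "__main__":
    A, N = 0.5, 1e6
    L = A * math.log(N) ** 2
    print("L =", L, "std =", math.sqrt(L), "S =", survival(N, A))
```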

We now consider graphs, grown randomly as follows. Initially, the system consists of $N$
single-node graphs. Then, two nodes are picked at random and a link is drawn between
them. If they belong to two distinct graphs, the two become one. This process is
repeated indefinitely. Let $n_k$ be the number of graphs of size $k$. The normalized
density $c_k = n_k/N$ evolves according to the rate equation
$\frac{dc_k}{dt} = \frac{1}{2} \sum_{i+j=k} i j\, c_i c_j - k c_k$ with the monodisperse
initial conditions $c_k(0) = \delta_{k,1}$. The rate equation reflects the fact that two
graphs are connected with rate proportional to the product of their sizes. This equation
can be solved using generating functions to give [20]

$$ c_k(t) = \frac{(kt)^{k-1}}{k \cdot k!}\, e^{-kt}. \tag{10} $$
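As a consistency check on Eq. (10), the following sketch verifies numerically that the closed form satisfies the rate equation (by comparing a finite-difference time derivative with the right-hand side) and that the total mass $\sum_k k c_k$ equals unity before gelation. The step `h`, the cutoff `K`, and the function names are assumptions made for illustration.

```python
import math


def c_graph(k, t):
    """Eq. (10): c_k(t) = (kt)^{k-1} e^{-kt} / (k * k!), computed via logarithms."""
    log_c = (k - 1) * math.log(k * t) - k * t - math.log(k) - math.lgamma(k + 1)
    return math.exp(log_c)


def rate_rhs(k, t):
    """(1/2) sum_{i+j=k} i j c_i c_j - k c_k, evaluated on Eq. (10)."""
    gain = 0.5 * sum(i * (k - i) * c_graph(i, t) * c_graph(k - i, t)
                     for i in range(1, k))
    return gain - k * c_graph(k, t)


def time_derivative(k, t, h=1e-5):
    """Central finite-difference estimate of dc_k/dt."""
    return (c_graph(k, t + h) - c_graph(k, t - h)) / (2.0 * h)


def mass(t, K=2000):
    """Truncated total mass sum_{k<=K} k c_k; equals 1 for t < 1 as K grows."""
    return sum(k * c_graph(k, t) for k in range(1, K + 1))


if __name__ == "__main__":
    t = 0.5
    for k in (1, 2, 3, 6):
        print(k, time_derivative(k, t), rate_rhs(k, t))
    print("mass:", mass(t))
```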



At time $t = 1$, the system undergoes a gelation transition: it develops a giant
component that eventually takes over the entire mass in the system. Close to this
gelation time the size distribution attains the scaling behavior

$$ c_k(t) \simeq k_*^{-5/2}\, \Phi(k/k_*), \qquad \Phi(x) \propto x^{-5/2} e^{-x/2}. \tag{11} $$



The typical size diverges, $k_* \simeq (1-t)^{-2}$, as $t \to 1$. At the gelation time,
the size distribution has an algebraic tail $c_k(t=1) \sim k^{-5/2}$. Hence, the
cumulative distribution is $u_k \sim N k^{-3/2}$ and the criterion $u_{l_w} \sim 1$
gives the average size of the giant component (the winner), $l_w \sim N^{2/3}$. The
time at which it emerges is $1 - t_w \sim N^{-1/3}$.

We estimate the size of the leader from the cumulative distribution, $u_l = 1$. For
$t \ll 1$, the size of the leader $l(t,N)$, the number of lead changes $L(t,N)$, as well
as the number of distinct leaders are all approximately equal and the same as for random
trees. The number of lead changes is of the order $\ln N$ in this phase; furthermore,
$L(t,N) \sim \ln N$ for $t < 1$. The behavior near the gelation time, $1 - t \ll 1$, is
a bit more interesting. From the large-$k$ behavior,
$u_k \sim N (1-t)^{-2} k^{-5/2} \exp[-k(1-t)^2/2]$, and $u_l = 1$ we arrive at the
following implicit relation for the leader $l(t,N)$:
$l \simeq 2(1-t)^{-2} \ln N - 3(1-t)^{-2} \ln l$. Inserting the zeroth order
approximation $l^{(0)} = 2(1-t)^{-2} \ln N$ into $\ln l$ on the right-hand side of the
above relation and ignoring $\ln\ln N$ terms yields the leader size

$$ l \simeq \frac{2}{(1-t)^2} \ln\left[N (1-t)^3\right]. \tag{12} $$



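The rate by which the leader changes is estimated from
$\frac{d}{dt} L \simeq \frac{\partial}{\partial t} u_k \big|_{k=l} \simeq l\,(1-t)$.
Substituting Eq. (12) and integrating, the number of lead changes is

$$ L(t,N) \simeq 2 \ln N \, \ln\frac{1}{1-t} - 3 \left[\ln\frac{1}{1-t}\right]^2. \tag{13} $$

It attains the scaling form

$$ L(t,N) \simeq (\ln N)^2 F(x), \qquad x = \frac{\ln\frac{1}{1-t}}{\ln N}, \tag{14} $$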



with the scaling function $F(x) = 2x - 3x^2$. As this behavior holds up to time $t_w$,
where $1 - t_w \sim N^{-1/3}$, the total number of lead changes grows according to
Eq. (7) with $A = F(1/3) = 1/3$. Furthermore, the distribution of lead changes is
Poissonian as in (8) and the survival probability decays according to (9).
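The integration behind Eqs. (13) and (14) is short enough to write out. The sketch below substitutes Eq. (12) into the rate $l\,(1-t)$ and changes variables to $\tau = \ln[1/(1-t)]$; the symbol $\tau$ is introduced here only for convenience and does not appear in the original text.

```latex
\begin{align*}
\frac{dL}{dt} &\simeq l\,(1-t) \simeq \frac{2}{1-t}\,\bigl[\ln N - 3\tau\bigr],
  \qquad \tau \equiv \ln\frac{1}{1-t},\quad d\tau = \frac{dt}{1-t}, \\
L(t,N) &\simeq \int_0^{\tau} 2\,(\ln N - 3\tau')\,d\tau'
  = 2\,\ln N\,\tau - 3\,\tau^{2}, \\
\frac{L(t,N)}{(\ln N)^{2}} &\simeq 2x - 3x^{2} = F(x),
  \qquad x = \frac{\tau}{\ln N}, \qquad F\!\left(\tfrac{1}{3}\right) = \tfrac{1}{3}.
\end{align*}
```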

As terms of the order $\ln\ln N/\ln N$ were neglected, the scaling behavior and the
leading asymptotic behavior may be realized only for extremely large systems. Moreover,
the computational cost of random graph simulations is larger because graphs are chosen
with probability proportional to their size. Nevertheless, we can confirm the predicted
system size dependence of $L(N)$ (Fig. 2) and $S(N)$ (Fig. 3) numerically. The prefactor
$A = 0.20(2)$ is lower than the theoretical value $A = 1/3$, perhaps due to the slow
convergence.
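One possible way to simulate the random graph process is sketched below. It uses a union-find (disjoint-set) structure, a technique chosen here for illustration and not described in the text: picking two random nodes automatically selects components with probability proportional to their size. As assumptions of this sketch, the first dimer is the initial leader, ties do not count as lead changes, and the run stops once the leader holds more than half of the nodes, since no other component can overtake it afterwards.

```python
import random


def find(parent, x):
    """Return the root of x, compressing the path by halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def graph_lead_changes(N, rng):
    """Add random links among N nodes; return the number of lead changes."""
    if N < 2:
        return 0
    parent = list(range(N))
    size = [1] * N       # size[root] = number of nodes in that component
    leader = None        # root of the current leader (None before the first merger)
    changes = 0
    while leader is None or 2 * size[leader] <= N:
        a = find(parent, rng.randrange(N))
        b = find(parent, rng.randrange(N))
        if a == b:
            continue     # link inside one component: sizes are unchanged
        involved = leader in (a, b)
        if size[a] < size[b]:
            a, b = b, a
        parent[b] = a    # union by size: attach the smaller root to the larger
        size[a] += size[b]
        if leader is None or involved:
            leader = a
        elif size[a] > size[leader]:
            leader = a   # two non-leader components overtook the leader
            changes += 1
    return changes


if __name__ == "__main__":
    rng = random.Random(0)
    for N in (100, 1000, 10000):
        runs = 200
        print(N, sum(graph_lead_changes(N, rng) for _ in range(runs)) / runs)
```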

Let us compare random trees and random graphs. They seem very different, e.g., the
gelation transition occurs in one case but not in the other. Yet, they exhibit similar
extremal characteristics. In both cases, the total number of lead changes $L(N)$ grows
as $(\ln N)^2$ and the survival probability decays as $\exp[-L]$. Moreover, even the
seemingly distinct temporal characteristics can be reconciled, e.g., in both cases the
size distribution attains the scaling form $c_k(t) \propto \Phi(k/k_*)$ when
$t \to \infty$ (for random trees) and $t \to 1$ (for random graphs). Of course, the
actual time dependence of the typical scale is different: $k_* \simeq t$ and
$k_* \simeq (1-t)^{-2}$, respectively. The size of the leader can be rewritten as
$l \sim k_* \ln[N/k_*^{\gamma}]$ with $\gamma = 1$ and $3/2$, respectively. Furthermore,
Eqs. (6) and (14) can be reconciled by writing the scaling variable in the unified form
$x = \ln k_*/\ln N$.

We now restrict our attention to the survival probability $S(N)$ and supplement the
leading behavior (9) with rigorous bounds. Consider random trees. A lower bound for
$S(N)$ is obtained from a greedy scenario in which all merger events involve the leader
until it reaches size $N/2$. The probability that the second merger involves the leading
dimer is $(N-1)^{-1}$; the probability that the third merger involves the leading trimer
is $(N-2)^{-1}$; etc. Thus, the greedy scenario is realized with probability
$\prod_{j<N/2} \frac{1}{N-j}$, thereby providing the lower bound
$S(N) > \frac{(N/2)!}{N!}$. An upper bound can be obtained by estimating the number of
trees of the same size as the leader. There are of the order $N^{1/2}$ dimers when the
first trimer is born, $n_2(t_3 \sim N^{-1/2}) \sim N^{1/2}$. The leading dimer retains
the lead with probability inversely proportional to the number of dimers, $N^{-1/2}$.
Similarly, the leading trimer retains the lead with probability proportional to
$N^{-1/3}$. Therefore the upper bound $\prod_{j<\ln N} N^{-1/j}$ is estimated as
$N^{-\ln\ln N}$ (the cutoff $j < \ln N$ is dictated by the size of the leader at the
crossover time $t \approx 1$). Hence, the survival probability obeys

$$ \left(\frac{e}{2N}\right)^{N/2} < S(N) < \exp\left[-(\ln N)(\ln\ln N)\right], \tag{15} $$

where the lower bound was simplified using the Stirling formula. Note that the upper
bound merely assures that the lead never changes in the early phase $t < 1$ when the
average number of lead changes is only $\ln N$.
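The Stirling simplification of the lower bound and the ordering of the bounds can be inspected numerically. The sketch below works with logarithms to avoid underflow and compares $\ln[(N/2)!/N!]$ with $\ln (e/2N)^{N/2}$; the two differ by a constant close to $\frac{1}{2}\ln 2$, which is negligible on the exponential scale. The function names are illustrative, and the leading-order value $A = 1/2$ is used only to show where Eq. (9) sits between the bounds.

```python
import math


def log_lower_exact(N):
    """ln[(N/2)!/N!] for even N, via the log-gamma function."""
    return math.lgamma(N // 2 + 1) - math.lgamma(N + 1)


def log_lower_stirling(N):
    """ln of the simplified lower bound (e/2N)^{N/2} in Eq. (15)."""
    return 0.5 * N * math.log(math.e / (2.0 * N))


def log_upper(N):
    """ln of the upper bound exp[-(ln N)(ln ln N)] in Eq. (15)."""
    return -math.log(N) * math.log(math.log(N))


if __name__ == "__main__":
    for N in (100, 1000, 10000):
        leading = -0.5 * math.log(N) ** 2      # ln S(N) from Eq. (9) with A = 1/2
        print(N, log_lower_exact(N), log_lower_stirling(N), leading, log_upper(N))
```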

For random graphs, the greedy scenario is again simple to analyze since, in a system
with a leader of size $j$ and $N-j$ monomers, the probability that the next merger
involves the leader is
$P_j = \frac{j(N-j)}{j(N-j) + \frac{1}{2}(N-j)(N-j-1)}$ or $P_j = \frac{2j}{N+j-1}$.
The product $\prod_{j<N/2} P_j$ provides the lower bound. Asymptotically, the lower
bound decays as $\lambda^N$ with $\lambda = (2/3)^{3/2} = 0.544331\ldots$. On the other
hand, repeating the above argument yields the same upper bound, so

$$ \lambda^N < S(N) < \exp\left[-(\ln N)(\ln\ln N)\right]. \tag{16} $$

The upper bound is again much closer to the actual 
asymptotic behavior. 
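The exponential rate of the random graph lower bound can be checked directly by summing $\ln P_j$. In the sketch below the product starts at $j = 2$ (the leading dimer), which is an assumption of this illustration; the starting index changes the product only by a factor polynomial in $N$ and does not affect the rate $\lambda$.

```python
import math


def log_greedy_product(N):
    """ln prod_{2 <= j < N/2} P_j with P_j = 2j / (N + j - 1)."""
    return sum(math.log(2.0 * j / (N + j - 1)) for j in range(2, N // 2))


if __name__ == "__main__":
    log_lambda = 1.5 * math.log(2.0 / 3.0)     # ln of (2/3)^{3/2}
    for N in (10**3, 10**4, 10**5):
        # The per-node rate should approach ln(lambda) as N grows.
        print(N, log_greedy_product(N) / N, log_lambda)
```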

In conclusion, random graphs and random trees exhibit similar leadership
characteristics. As in random growing networks, lead changes are infrequent given that
the overall number of lead changes increases only as a power of the logarithm of the
system size. The time-dependent number of lead changes approaches a self-similar form
asymptotically. The convergence to the asymptotic behavior is much slower for extremal
statistics compared with size statistics due to the various logarithmic dependences.
Consequently, the asymptotic behavior may be difficult to detect in practice, especially
for random graphs.

To obtain the extremal characteristics, we employed 
the scaling behavior of the size distribution outside the 
scaling regime, namely, at sizes much larger than the typ- 
ical size where, at least formally, statistical fluctuations 
can no longer be ignored. Interestingly, the resulting sys- 
tem size dependence for the various leadership statistics 
appears to be asymptotically exact. Further analysis is 
needed to illuminate the role of statistical fluctuations, 
for example by characterizing corrections to the leading 
behavior. 

The virtue of the rate equation approach to analyzing 
extremal characteristics is its simplicity, robustness, and 
generality. It applies to general aggregation processes 
where the merger rate may depend in a complicated man- 
ner on the aggregate size or in situations where there is 
an underlying spatial structure. We find that the above 
leadership statistics extend to algebraic merger rates as 
well as to aggregation in one spatial dimension. This 
method is also applicable to other extremal features in- 
cluding for example laggard (smallest component) statis- 
tics. In the case of random trees, for instance, the total 
number of laggard changes grows logarithmically with
the system size.